Installation note: You may need to install Cairo on your operating system to run this notebook. See README for details.

if(!require(Cairo)) install.packages("Cairo", repos = "http://cran.us.r-project.org")
## Loading required package: Cairo

Introduction

The purpose of this project is to predict the price of houses in California in 1990 based on a number of possible location-based predictors, including latitude, longitude, and information about houses within a particular block.

Although this project focuses on prediction, readers should be aware that California housing prices rose sharply after 1990, fell when the housing bubble burst, and then rose again. This model should not be used to predict actual future prices. It is a purely academic exercise in statistical prediction.

The goal of the project is to create the model that can best predict home prices in California given reasonable test/train splits in the data.
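
As a minimal sketch of what evaluating a model on a test/train split can look like, the example below holds out 20% of the rows and reports the root mean squared error (RMSE) on them. It uses R's built-in mtcars data as a stand-in for the housing data, and the helper rmse, the 80/20 ratio, and the choice of predictors are illustrative assumptions rather than choices made in this project.

```r
# Hypothetical helper: root mean squared error of a set of predictions.
rmse = function(actual, predicted) {
  sqrt(mean((actual - predicted)^2))
}

set.seed(420)                                  # make the split reproducible
n_obs     = nrow(mtcars)                       # built-in data, stand-in for housing_data
train_idx = sample(n_obs, size = round(0.8 * n_obs))
train     = mtcars[train_idx, ]
test      = mtcars[-train_idx, ]

fit       = lm(mpg ~ wt + hp, data = train)    # fit on the training rows only
test_rmse = rmse(test$mpg, predict(fit, newdata = test))
test_rmse                                      # error on rows the model never saw
```

Comparing candidate models by their RMSE on the held-out rows, rather than on the training rows, guards against picking a model that merely overfits.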

Dataset

We’re using the California Housing Prices dataset from the following Kaggle site: https://www.kaggle.com/camnugent/california-housing-prices. Each observation describes the houses in one California district, with summary statistics drawn from the 1990 census.

We loaded housing.csv into R.

library(readr)
library(knitr)
library(caret)
## Loading required package: lattice
## Loading required package: ggplot2
housing_data = read_csv("housing.csv")
## Parsed with column specification:
## cols(
##   longitude = col_double(),
##   latitude = col_double(),
##   housing_median_age = col_double(),
##   total_rooms = col_double(),
##   total_bedrooms = col_double(),
##   population = col_double(),
##   households = col_double(),
##   median_income = col_double(),
##   median_house_value = col_double(),
##   ocean_proximity = col_character()
## )
housing_data$median_house_value[1:100]
##   [1] 452600 358500 352100 341300 342200 269700 299200 241400 226700 261100
##  [11] 281500 241800 213500 191300 159200 140000 152500 155500 158700 162900
##  [21] 147500 159800 113900  99700 132600 107500  93800 105500 108900 132000
##  [31] 122300 115200 110400 104900 109700  97200 104500 103900 191400 176000
##  [41] 155400 150000 118800 188800 184400 182300 142500 137500 187500 112500
##  [51] 171900  93800  97500 104200  87500  83100  87500  85300  80300  60000
##  [61]  75700  75000  86100  76100  73500  78400  84400  81300  85000 129200
##  [71]  82500  95200  75000  67500 137500 177500 102100 108300 112500 131300
##  [81] 162500 112500 112500 137500 118800  98200 118800 162500 137500 500001
##  [91] 162500 137500 162500 187500 179200 130000 183800 125000 170000 193100

The dataset contains 20640 observations and 10 attributes (9 predictors and 1 response). Below is a list of the variables with descriptions taken from the original Kaggle site given above.

  • longitude: A measure of how far west a house is; values are negative in California, so a lower (more negative) value is farther west
  • latitude: A measure of how far north a house is; a higher value is farther north
  • housingMedianAge: Median age of a house within a block; a lower number is a newer building
  • totalRooms: Total number of rooms within a block
  • totalBedrooms: Total number of bedrooms within a block
  • population: Total number of people residing within a block
  • households: Total number of households, a group of people residing within a home unit, for a block
  • medianIncome: Median income for households within a block of houses (measured in tens of thousands of US Dollars)
  • oceanProximity: Location of the house w.r.t ocean/sea
  • medianHouseValue: Median house value for households within a block (measured in US Dollars)

This dataset meets all of the stated criteria for the project including:

  • A minimum of 200 observations
  • A numeric response variable - median_house_value
  • At least one categorical predictor - ocean_proximity
  • At least two numeric predictors - the remaining attributes

Let’s look at a summary of each column.

summary(housing_data) # summary of each column; note that total_bedrooms has 207 NAs, which we will need to impute
##    longitude         latitude     housing_median_age  total_rooms   
##  Min.   :-124.3   Min.   :32.54   Min.   : 1.00      Min.   :    2  
##  1st Qu.:-121.8   1st Qu.:33.93   1st Qu.:18.00      1st Qu.: 1448  
##  Median :-118.5   Median :34.26   Median :29.00      Median : 2127  
##  Mean   :-119.6   Mean   :35.63   Mean   :28.64      Mean   : 2636  
##  3rd Qu.:-118.0   3rd Qu.:37.71   3rd Qu.:37.00      3rd Qu.: 3148  
##  Max.   :-114.3   Max.   :41.95   Max.   :52.00      Max.   :39320  
##                                                                     
##  total_bedrooms     population      households     median_income    
##  Min.   :   1.0   Min.   :    3   Min.   :   1.0   Min.   : 0.4999  
##  1st Qu.: 296.0   1st Qu.:  787   1st Qu.: 280.0   1st Qu.: 2.5634  
##  Median : 435.0   Median : 1166   Median : 409.0   Median : 3.5348  
##  Mean   : 537.9   Mean   : 1425   Mean   : 499.5   Mean   : 3.8707  
##  3rd Qu.: 647.0   3rd Qu.: 1725   3rd Qu.: 605.0   3rd Qu.: 4.7432  
##  Max.   :6445.0   Max.   :35682   Max.   :6082.0   Max.   :15.0001  
##  NA's   :207                                                        
##  median_house_value ocean_proximity   
##  Min.   : 14999     Length:20640      
##  1st Qu.:119600     Class :character  
##  Median :179700     Mode  :character  
##  Mean   :206856                       
##  3rd Qu.:264725                       
##  Max.   :500001                       
## 

Data Cleaning

Initial exploration of the data showed that a few steps were needed to make the data more usable. First, we converted the categorical variable ocean_proximity from a character column to a factor.

housing_data$ocean_proximity = as.factor(housing_data$ocean_proximity)
ocean_proximity = housing_data$ocean_proximity

We see that the factor variable ocean_proximity has the following \(5\) levels: <1H OCEAN, INLAND, ISLAND, NEAR BAY, NEAR OCEAN.
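
To illustrate why the conversion matters, here is a small sketch on a toy vector (the values are made up for illustration and the names prox, prox_f, and design are hypothetical). It shows how to inspect the levels and how a factor expands into dummy columns when used in a linear model, assuming R's default treatment contrasts.

```r
# Toy vector standing in for housing_data$ocean_proximity (illustrative values).
prox   = c("INLAND", "NEAR BAY", "INLAND", "ISLAND", "NEAR OCEAN", "<1H OCEAN")
prox_f = as.factor(prox)

levels(prox_f)   # the distinct categories
table(prox_f)    # number of observations in each level

# With default treatment contrasts, a k-level factor contributes k - 1 dummy
# columns to the design matrix; the remaining level is the baseline.
design = model.matrix(~ prox_f)
ncol(design)     # intercept + (nlevels - 1) dummy columns
```

A level with very few observations yields a dummy column that is almost entirely zeros, which is worth keeping in mind when a category is rare.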

The other thing to consider is missing data.

sum(is.na(housing_data))
## [1] 207
total_bedrooms = housing_data$total_bedrooms
sum(is.na(total_bedrooms))
## [1] 207

There are \(207\) observations with missing data for total_bedrooms, and we need to decide how to handle them. However, the relationship between total_bedrooms and total_rooms suggests the two may be collinear, in which case including total_bedrooms in our model would add little information. Further testing is required before we can make this decision.

plot(housing_data$total_bedrooms ~ housing_data$total_rooms, col="dodgerblue")

Another option is to fill in the missing total_bedrooms values with the median of total_bedrooms grouped by total_rooms, since the two are related.

library(tidyverse)
housing_data %>% 
  group_by(total_rooms) %>% 
  summarize(median.total_bedrooms = median(total_bedrooms, na.rm = TRUE))
## # A tibble: 5,926 x 2
##    total_rooms median.total_bedrooms
##          <dbl>                 <dbl>
##  1           2                   2  
##  2           6                   2  
##  3           8                   1  
##  4          11                  11  
##  5          12                   4  
##  6          15                   4  
##  7          16                   4  
##  8          18                   3.5
##  9          19                  12  
## 10          20                   4.5
## # … with 5,916 more rows
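
The grouped medians above only summarize the data. A minimal sketch of actually filling in the missing values could look like the following. It uses base R on a toy data frame (toy and the column total_bedrooms_imputed are hypothetical names), and it assumes a fallback to the overall median for groups in which every total_bedrooms value is missing, which can easily happen when most total_rooms values occur only once.

```r
# Toy data: total_bedrooms has missing values, as in the housing data.
toy = data.frame(
  total_rooms    = c(10, 10, 10, 20, 20, 30),
  total_bedrooms = c( 2, NA,  4,  5, NA, NA)
)

# Median of total_bedrooms within each total_rooms group
# (NA when every value in the group is missing).
group_median = ave(toy$total_bedrooms, toy$total_rooms,
                   FUN = function(x) median(x, na.rm = TRUE))

# Assumed fallback for groups with no observed value: the overall median.
overall_median = median(toy$total_bedrooms, na.rm = TRUE)

missing = is.na(toy$total_bedrooms)
toy$total_bedrooms_imputed = toy$total_bedrooms
toy$total_bedrooms_imputed[missing] = ifelse(is.na(group_median[missing]),
                                             overall_median,
                                             group_median[missing])
toy
```

Keeping the imputed values in a separate column makes it easy to compare them against the original and to swap in a different imputation method later.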

Looking at the structure of the dataset after this clean up, we see that besides the one factor variable ocean_proximity, we are left with nine numeric variables, three of which are continuous (longitude, latitude, and median_income) and six of which are discrete (housing_median_age, total_rooms, total_bedrooms, population, households, and median_house_value).

str(housing_data)
## Classes 'spec_tbl_df', 'tbl_df', 'tbl' and 'data.frame': 20640 obs. of  10 variables:
##  $ longitude         : num  -122 -122 -122 -122 -122 ...
##  $ latitude          : num  37.9 37.9 37.9 37.9 37.9 ...
##  $ housing_median_age: num  41 21 52 52 52 52 52 52 42 52 ...
##  $ total_rooms       : num  880 7099 1467 1274 1627 ...
##  $ total_bedrooms    : num  129 1106 190 235 280 ...
##  $ population        : num  322 2401 496 558 565 ...
##  $ households        : num  126 1138 177 219 259 ...
##  $ median_income     : num  8.33 8.3 7.26 5.64 3.85 ...
##  $ median_house_value: num  452600 358500 352100 341300 342200 ...
##  $ ocean_proximity   : Factor w/ 5 levels "<1H OCEAN","INLAND",..: 4 4 4 4 4 4 4 4 4 4 ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   longitude = col_double(),
##   ..   latitude = col_double(),
##   ..   housing_median_age = col_double(),
##   ..   total_rooms = col_double(),
##   ..   total_bedrooms = col_double(),
##   ..   population = col_double(),
##   ..   households = col_double(),
##   ..   median_income = col_double(),
##   ..   median_house_value = col_double(),
##   ..   ocean_proximity = col_character()
##   .. )

Let’s look a bit more closely at the distribution of the numeric variables.

par(mfrow = c(3, 3))
hist(housing_data$longitude, breaks = 20, main = "longitude", border="darkorange", col="dodgerblue")
hist(housing_data$latitude, breaks = 20, main = "latitude", border="darkorange", col="dodgerblue")
hist(housing_data$housing_median_age, breaks = 20, main = "housing_median_age", border="darkorange", col="dodgerblue")
hist(housing_data$total_rooms, breaks = 20, main = "total_rooms", border="darkorange", col="dodgerblue")
hist(housing_data$total_bedrooms, breaks = 20, main = "total_bedrooms", border="darkorange", col="dodgerblue")
hist(housing_data$population, breaks = 20, main = "population", border="darkorange", col="dodgerblue")
hist(housing_data$households, breaks = 20, main = "households", border="darkorange", col="dodgerblue")
hist(housing_data$median_income, breaks = 20, main = "median_income", border="darkorange", col="dodgerblue")
hist(housing_data$median_house_value, breaks = 20, main = "median_house_value", border="darkorange", col="dodgerblue")

And let’s look at the relationships between all the possible variables.

pairs(housing_data, col = "dodgerblue")

In addition to the linear relationship between total_rooms and total_bedrooms noted earlier, we also need to look into potential collinearity between households and total_bedrooms (and possibly total_rooms).
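
One common way to quantify collinearity is the variance inflation factor (VIF), which for a given predictor equals 1 / (1 - R^2), where R^2 comes from regressing that predictor on the other predictors. The sketch below computes it by hand on synthetic data. The simulated variables only borrow names from the dataset, and the helper vif_manual is hypothetical. It is not an analysis of the housing data itself.

```r
set.seed(420)
n_sim = 200

# Synthetic stand-ins (assumed): bedrooms track households closely,
# while income is generated independently of both.
households     = rnorm(n_sim, mean = 500, sd = 100)
total_bedrooms = 1.1 * households + rnorm(n_sim, sd = 10)
median_income  = rnorm(n_sim, mean = 4, sd = 1)

# VIF of a predictor = 1 / (1 - R^2) from regressing it on the other predictors.
vif_manual = function(target, others) {
  r2 = summary(lm(target ~ ., data = others))$r.squared
  1 / (1 - r2)
}

vif_bedrooms = vif_manual(total_bedrooms, data.frame(households, median_income))
vif_income   = vif_manual(median_income,  data.frame(households, total_bedrooms))
c(total_bedrooms = vif_bedrooms, median_income = vif_income)
```

A VIF near 1 indicates little collinearity. Values well above 5 or 10 are often treated as a sign that a predictor is largely explained by the others, although those thresholds are rules of thumb rather than strict cutoffs.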

library(ggplot2)
#we want to look at shape of distribution to get a good idea of what to impute
ggplot(housing_data, aes(x = total_bedrooms)) +
  geom_histogram(bins = 40) +
  xlab("Total Bedrooms") +
  ylab("Count") +
  ggtitle("Histogram of Total Bedrooms (noncontinuous variable)")
## Warning: Removed 207 rows containing non-finite values (stat_bin).

#using mean for now
library(mice)
## 
## Attaching package: 'mice'
## The following object is masked from 'package:tidyr':
## 
##     complete
## The following objects are masked from 'package:base':
## 
##     cbind, rbind
housing_data_temp = mice(data = housing_data, m = 5, method = "mean", seed = 420)
## 
##  iter imp variable
##   1   1  total_bedrooms
##   1   2  total_bedrooms
##   1   3  total_bedrooms
##   1   4  total_bedrooms
##   1   5  total_bedrooms
##   2   1  total_bedrooms
##   2   2  total_bedrooms
##   2   3  total_bedrooms
##   2   4  total_bedrooms
##   2   5  total_bedrooms
##   3   1  total_bedrooms
##   3   2  total_bedrooms
##   3   3  total_bedrooms
##   3   4  total_bedrooms
##   3   5  total_bedrooms
##   4   1  total_bedrooms
##   4   2  total_bedrooms
##   4   3  total_bedrooms
##   4   4  total_bedrooms
##   4   5  total_bedrooms
##   5   1  total_bedrooms
##   5   2  total_bedrooms
##   5   3  total_bedrooms
##   5   4  total_bedrooms
##   5   5  total_bedrooms
housing_data_full  = complete(housing_data_temp, 1)
housing_data_full$ocean_proximity = as.factor(housing_data_full$ocean_proximity)
housing_data_nc = housing_data_full[, -10] # drop ocean_proximity (column 10, a factor) for now so that cor() receives only numeric columns

corrmatrix = cor(housing_data_nc)

kable(t(corrmatrix))
                     longitude    latitude  housing_median_age  total_rooms  total_bedrooms  population  households  median_income  median_house_value
longitude            1.0000000  -0.9246644          -0.1081968    0.0445680       0.0692597   0.0997732   0.0553101     -0.0151759          -0.0459666
latitude            -0.9246644   1.0000000           0.0111727   -0.0360996      -0.0666584  -0.1087847  -0.0710354     -0.0798091          -0.1441603
housing_median_age  -0.1081968   0.0111727           1.0000000   -0.3612622      -0.3189983  -0.2962442  -0.3029160     -0.1190340           0.1056234
total_rooms          0.0445680  -0.0360996          -0.3612622    1.0000000       0.9272527   0.8571260   0.9184845      0.1980496           0.1341531
total_bedrooms       0.0692597  -0.0666584          -0.3189983    0.9272527       1.0000000   0.8739095   0.9747249     -0.0076819           0.0494535
population           0.0997732  -0.1087847          -0.2962442    0.8571260       0.8739095   1.0000000   0.9072223      0.0048343          -0.0246497
households           0.0553101  -0.0710354          -0.3029160    0.9184845       0.9747249   0.9072223   1.0000000      0.0130331           0.0658427
median_income       -0.0151759  -0.0798091          -0.1190340    0.1980496      -0.0076819   0.0048343   0.0130331      1.0000000           0.6880752
median_house_value  -0.0459666  -0.1441603           0.1056234    0.1341531       0.0494535  -0.0246497   0.0658427      0.6880752           1.0000000
highcorr = findCorrelation(corrmatrix, cutoff = .60) # column indices that findCorrelation suggests removing to reduce pairwise correlations above the cutoff
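
Because findCorrelation returns only the indices of columns it suggests removing, it can help to also list which pairs of variables exceed the cutoff. The following base R sketch does that on a small 3 x 3 matrix whose entries are rounded from the correlation table for three of the variables. The names cm, hits, and high_pairs are hypothetical.

```r
# Small illustrative correlation matrix (rounded values for three variables).
vars = c("total_rooms", "total_bedrooms", "median_income")
cm = matrix(c(1.000,  0.927,  0.198,
              0.927,  1.000, -0.008,
              0.198, -0.008,  1.000),
            nrow = 3, byrow = TRUE, dimnames = list(vars, vars))

cutoff = 0.60

# Look only at the upper triangle so each pair is reported once
# and the diagonal of 1s is ignored.
hits = which(abs(cm) > cutoff & upper.tri(cm), arr.ind = TRUE)

high_pairs = data.frame(
  var1 = rownames(cm)[hits[, "row"]],
  var2 = colnames(cm)[hits[, "col"]],
  r    = cm[hits]
)
high_pairs
```

The same pattern should work on the full corrmatrix, since it relies only on the matrix having row and column names.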
library(scales)
## 
## Attaching package: 'scales'
## The following object is masked from 'package:purrr':
## 
##     discard
## The following object is masked from 'package:readr':
## 
##     col_factor
library(RColorBrewer)
library(plotly)
## 
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## The following object is masked from 'package:stats':
## 
##     filter
## The following object is masked from 'package:graphics':
## 
##     layout
plot_map = ggplot(housing_data_full, 
                  aes(x = longitude, y = latitude, color = median_house_value, hma = housing_median_age,
                      tr = total_rooms, tb = total_bedrooms, hh = households, mi = median_income)) +
              geom_point(aes(size = population), alpha = 0.4) +
              xlab("Longitude") +
              ylab("Latitude") +
              ggtitle("Data Map - Longitude vs Latitude and Associated Variables") +
              theme(plot.title = element_text(hjust = 0.5)) +
              scale_color_distiller(palette = "Paired", labels = comma) +
              labs(color = "Median House Value (in $USD)", size = "Population")
plot_map_tt = ggplotly(plot_map)

plot_map_tt
temp_housing_data = housing_data_full[housing_data_full$ocean_proximity != "ISLAND", ] # possibly consider removing ISLAND proximity homes; there are only 5 in the dataset out of about 20k observations

start_mod = lm(median_house_value ~ (.)^2, data = temp_housing_data)
 
n = length(resid(start_mod))
back_bic_mod = step(start_mod, direction = "backward", k = log(n)) # k = log(n) applies the BIC penalty; step() still labels the criterion "AIC" in its trace
## Start:  AIC=456726.7
## median_house_value ~ (longitude + latitude + housing_median_age + 
##     total_rooms + total_bedrooms + population + households + 
##     median_income + ocean_proximity)^2
## 
##                                      Df  Sum of Sq        RSS    AIC
## - total_bedrooms:ocean_proximity      3 3.3966e+10 8.2016e+13 456705
## - total_rooms:total_bedrooms          1 9.1245e+07 8.1982e+13 456717
## - housing_median_age:total_bedrooms   1 9.9335e+08 8.1983e+13 456717
## - households:ocean_proximity          3 8.0876e+10 8.2063e+13 456717
## - total_bedrooms:median_income        1 2.3709e+09 8.1985e+13 456717
## - longitude:households                1 1.8557e+10 8.2001e+13 456721
## - total_bedrooms:population           1 2.3950e+10 8.2006e+13 456723
## - latitude:households                 1 2.6834e+10 8.2009e+13 456723
## - longitude:population                1 2.7677e+10 8.2010e+13 456724
## <none>                                             8.1982e+13 456727
## - households:median_income            1 4.3377e+10 8.2026e+13 456728
## - latitude:population                 1 5.3803e+10 8.2036e+13 456730
## - longitude:total_bedrooms            1 6.1010e+10 8.2043e+13 456732
## - population:median_income            1 6.4331e+10 8.2047e+13 456733
## - total_rooms:ocean_proximity         3 1.4739e+11 8.2130e+13 456734
## - housing_median_age:total_rooms      1 7.6819e+10 8.2059e+13 456736
## - housing_median_age:median_income    1 8.7309e+10 8.2070e+13 456739
## - total_rooms:households              1 9.1399e+10 8.2074e+13 456740
## - population:households               1 9.2161e+10 8.2074e+13 456740
## - latitude:total_bedrooms             1 9.8267e+10 8.2080e+13 456741
## - housing_median_age:ocean_proximity  3 1.8076e+11 8.2163e+13 456742
## - longitude:total_rooms               1 1.2431e+11 8.2107e+13 456748
## - total_rooms:median_income           1 1.3340e+11 8.2116e+13 456750
## - latitude:total_rooms                1 1.5767e+11 8.2140e+13 456756
## - total_rooms:population              1 1.8232e+11 8.2165e+13 456763
## - median_income:ocean_proximity       3 3.1068e+11 8.2293e+13 456775
## - population:ocean_proximity          3 3.8433e+11 8.2367e+13 456793
## - longitude:latitude                  1 3.9347e+11 8.2376e+13 456816
## - total_bedrooms:households           1 3.9597e+11 8.2378e+13 456816
## - housing_median_age:households       1 4.2291e+11 8.2405e+13 456823
## - longitude:housing_median_age        1 4.5976e+11 8.2442e+13 456832
## - latitude:housing_median_age         1 6.3099e+11 8.2613e+13 456875
## - longitude:median_income             1 6.5260e+11 8.2635e+13 456880
## - latitude:median_income              1 7.2664e+11 8.2709e+13 456899
## - housing_median_age:population       1 9.7874e+11 8.2961e+13 456962
## - latitude:ocean_proximity            3 1.2166e+12 8.3199e+13 457001
## - longitude:ocean_proximity           3 2.1565e+12 8.4139e+13 457233
## 
## Step:  AIC=456705.4
## median_house_value ~ longitude + latitude + housing_median_age + 
##     total_rooms + total_bedrooms + population + households + 
##     median_income + ocean_proximity + longitude:latitude + longitude:housing_median_age + 
##     longitude:total_rooms + longitude:total_bedrooms + longitude:population + 
##     longitude:households + longitude:median_income + longitude:ocean_proximity + 
##     latitude:housing_median_age + latitude:total_rooms + latitude:total_bedrooms + 
##     latitude:population + latitude:households + latitude:median_income + 
##     latitude:ocean_proximity + housing_median_age:total_rooms + 
##     housing_median_age:total_bedrooms + housing_median_age:population + 
##     housing_median_age:households + housing_median_age:median_income + 
##     housing_median_age:ocean_proximity + total_rooms:total_bedrooms + 
##     total_rooms:population + total_rooms:households + total_rooms:median_income + 
##     total_rooms:ocean_proximity + total_bedrooms:population + 
##     total_bedrooms:households + total_bedrooms:median_income + 
##     population:households + population:median_income + population:ocean_proximity + 
##     households:median_income + households:ocean_proximity + median_income:ocean_proximity
## 
##                                      Df  Sum of Sq        RSS    AIC
## - total_rooms:total_bedrooms          1 1.1063e+09 8.2017e+13 456696
## - housing_median_age:total_bedrooms   1 2.1540e+09 8.2018e+13 456696
## - total_bedrooms:median_income        1 2.9676e+09 8.2019e+13 456696
## - households:ocean_proximity          3 8.7377e+10 8.2104e+13 456698
## - total_bedrooms:population           1 1.3989e+10 8.2030e+13 456699
## - longitude:households                1 2.6159e+10 8.2042e+13 456702
## - longitude:population                1 2.7868e+10 8.2044e+13 456702
## - latitude:households                 1 3.7451e+10 8.2054e+13 456705
## <none>                                             8.2016e+13 456705
## - households:median_income            1 4.5228e+10 8.2061e+13 456707
## - latitude:population                 1 5.5671e+10 8.2072e+13 456709
## - population:median_income            1 6.4610e+10 8.2081e+13 456712
## - housing_median_age:total_rooms      1 7.3952e+10 8.2090e+13 456714
## - total_rooms:ocean_proximity         3 1.6112e+11 8.2177e+13 456716
## - housing_median_age:median_income    1 8.3276e+10 8.2099e+13 456716
## - total_rooms:households              1 8.8196e+10 8.2104e+13 456718
## - housing_median_age:ocean_proximity  3 1.7426e+11 8.2190e+13 456719
## - longitude:total_bedrooms            1 9.5540e+10 8.2112e+13 456720
## - longitude:total_rooms               1 1.3119e+11 8.2147e+13 456728
## - total_rooms:median_income           1 1.3250e+11 8.2149e+13 456729
## - population:households               1 1.4602e+11 8.2162e+13 456732
## - latitude:total_bedrooms             1 1.5925e+11 8.2175e+13 456736
## - latitude:total_rooms                1 1.7523e+11 8.2191e+13 456740
## - total_rooms:population              1 1.8910e+11 8.2205e+13 456743
## - median_income:ocean_proximity       3 3.0915e+11 8.2325e+13 456753
## - population:ocean_proximity          3 3.6036e+11 8.2377e+13 456766
## - longitude:latitude                  1 3.9532e+11 8.2412e+13 456795
## - total_bedrooms:households           1 4.1177e+11 8.2428e+13 456799
## - housing_median_age:households       1 4.5132e+11 8.2468e+13 456809
## - longitude:housing_median_age        1 4.6022e+11 8.2476e+13 456811
## - latitude:housing_median_age         1 6.3264e+11 8.2649e+13 456854
## - longitude:median_income             1 6.8519e+11 8.2701e+13 456867
## - latitude:median_income              1 7.6876e+11 8.2785e+13 456888
## - housing_median_age:population       1 9.8590e+11 8.3002e+13 456942
## - latitude:ocean_proximity            3 1.2232e+12 8.3239e+13 456981
## - longitude:ocean_proximity           3 2.1653e+12 8.4181e+13 457213
## 
## Step:  AIC=456695.8
## median_house_value ~ longitude + latitude + housing_median_age + 
##     total_rooms + total_bedrooms + population + households + 
##     median_income + ocean_proximity + longitude:latitude + longitude:housing_median_age + 
##     longitude:total_rooms + longitude:total_bedrooms + longitude:population + 
##     longitude:households + longitude:median_income + longitude:ocean_proximity + 
##     latitude:housing_median_age + latitude:total_rooms + latitude:total_bedrooms + 
##     latitude:population + latitude:households + latitude:median_income + 
##     latitude:ocean_proximity + housing_median_age:total_rooms + 
##     housing_median_age:total_bedrooms + housing_median_age:population + 
##     housing_median_age:households + housing_median_age:median_income + 
##     housing_median_age:ocean_proximity + total_rooms:population + 
##     total_rooms:households + total_rooms:median_income + total_rooms:ocean_proximity + 
##     total_bedrooms:population + total_bedrooms:households + total_bedrooms:median_income + 
##     population:households + population:median_income + population:ocean_proximity + 
##     households:median_income + households:ocean_proximity + median_income:ocean_proximity
## 
##                                      Df  Sum of Sq        RSS    AIC
## - total_bedrooms:median_income        1 2.5553e+09 8.2020e+13 456686
## - housing_median_age:total_bedrooms   1 3.1234e+09 8.2020e+13 456687
## - households:ocean_proximity          3 8.7733e+10 8.2105e+13 456688
## - total_bedrooms:population           1 2.5262e+10 8.2043e+13 456692
## - longitude:population                1 2.8284e+10 8.2046e+13 456693
## - longitude:households                1 3.1457e+10 8.2049e+13 456694
## <none>                                             8.2017e+13 456696
## - latitude:households                 1 4.0683e+10 8.2058e+13 456696
## - households:median_income            1 4.4212e+10 8.2062e+13 456697
## - latitude:population                 1 5.6079e+10 8.2073e+13 456700
## - population:median_income            1 6.5561e+10 8.2083e+13 456702
## - housing_median_age:total_rooms      1 7.4381e+10 8.2092e+13 456705
## - total_rooms:ocean_proximity         3 1.6041e+11 8.2178e+13 456706
## - housing_median_age:median_income    1 8.3430e+10 8.2101e+13 456707
## - housing_median_age:ocean_proximity  3 1.7407e+11 8.2191e+13 456710
## - longitude:total_bedrooms            1 9.5982e+10 8.2113e+13 456710
## - total_rooms:median_income           1 1.3217e+11 8.2149e+13 456719
## - longitude:total_rooms               1 1.3427e+11 8.2152e+13 456720
## - latitude:total_bedrooms             1 1.5834e+11 8.2176e+13 456726
## - total_rooms:households              1 1.7352e+11 8.2191e+13 456729
## - latitude:total_rooms                1 1.7765e+11 8.2195e+13 456730
## - total_rooms:population              1 1.8828e+11 8.2206e+13 456733
## - median_income:ocean_proximity       3 3.0811e+11 8.2325e+13 456743
## - population:households               1 2.5710e+11 8.2274e+13 456750
## - population:ocean_proximity          3 3.5958e+11 8.2377e+13 456756
## - longitude:latitude                  1 3.9452e+11 8.2412e+13 456785
## - total_bedrooms:households           1 4.1068e+11 8.2428e+13 456789
## - longitude:housing_median_age        1 4.5931e+11 8.2477e+13 456801
## - housing_median_age:households       1 4.8648e+11 8.2504e+13 456808
## - latitude:housing_median_age         1 6.3165e+11 8.2649e+13 456844
## - longitude:median_income             1 6.8719e+11 8.2704e+13 456858
## - latitude:median_income              1 7.7050e+11 8.2788e+13 456879
## - housing_median_age:population       1 9.8483e+11 8.3002e+13 456932
## - latitude:ocean_proximity            3 1.2224e+12 8.3240e+13 456971
## - longitude:ocean_proximity           3 2.1642e+12 8.4181e+13 457203
## 
## Step:  AIC=456686.5
## median_house_value ~ longitude + latitude + housing_median_age + 
##     total_rooms + total_bedrooms + population + households + 
##     median_income + ocean_proximity + longitude:latitude + longitude:housing_median_age + 
##     longitude:total_rooms + longitude:total_bedrooms + longitude:population + 
##     longitude:households + longitude:median_income + longitude:ocean_proximity + 
##     latitude:housing_median_age + latitude:total_rooms + latitude:total_bedrooms + 
##     latitude:population + latitude:households + latitude:median_income + 
##     latitude:ocean_proximity + housing_median_age:total_rooms + 
##     housing_median_age:total_bedrooms + housing_median_age:population + 
##     housing_median_age:households + housing_median_age:median_income + 
##     housing_median_age:ocean_proximity + total_rooms:population + 
##     total_rooms:households + total_rooms:median_income + total_rooms:ocean_proximity + 
##     total_bedrooms:population + total_bedrooms:households + population:households + 
##     population:median_income + population:ocean_proximity + households:median_income + 
##     households:ocean_proximity + median_income:ocean_proximity
## 
##                                      Df  Sum of Sq        RSS    AIC
## - housing_median_age:total_bedrooms   1 2.0299e+09 8.2022e+13 456677
## - households:ocean_proximity          3 8.7069e+10 8.2107e+13 456679
## - total_bedrooms:population           1 2.7022e+10 8.2047e+13 456683
## - longitude:population                1 2.8545e+10 8.2048e+13 456684
## - longitude:households                1 3.3347e+10 8.2053e+13 456685
## <none>                                             8.2020e+13 456686
## - latitude:households                 1 4.3781e+10 8.2064e+13 456688
## - latitude:population                 1 5.6298e+10 8.2076e+13 456691
## - population:median_income            1 6.3082e+10 8.2083e+13 456692
## - housing_median_age:total_rooms      1 7.3394e+10 8.2093e+13 456695
## - households:median_income            1 7.5800e+10 8.2096e+13 456696
## - total_rooms:ocean_proximity         3 1.5887e+11 8.2179e+13 456697
## - housing_median_age:median_income    1 8.3562e+10 8.2103e+13 456698
## - housing_median_age:ocean_proximity  3 1.7339e+11 8.2193e+13 456700
## - longitude:total_bedrooms            1 9.4292e+10 8.2114e+13 456700
## - longitude:total_rooms               1 1.3173e+11 8.2152e+13 456710
## - total_rooms:median_income           1 1.3243e+11 8.2152e+13 456710
## - latitude:total_bedrooms             1 1.5895e+11 8.2179e+13 456716
## - total_rooms:households              1 1.7110e+11 8.2191e+13 456720
## - latitude:total_rooms                1 1.7511e+11 8.2195e+13 456721
## - total_rooms:population              1 1.8608e+11 8.2206e+13 456723
## - median_income:ocean_proximity       3 3.0580e+11 8.2326e+13 456733
## - population:households               1 2.5813e+11 8.2278e+13 456741
## - population:ocean_proximity          3 3.5744e+11 8.2377e+13 456746
## - longitude:latitude                  1 3.9378e+11 8.2414e+13 456775
## - total_bedrooms:households           1 4.0815e+11 8.2428e+13 456779
## - longitude:housing_median_age        1 4.6132e+11 8.2481e+13 456792
## - housing_median_age:households       1 4.9580e+11 8.2516e+13 456801
## - latitude:housing_median_age         1 6.3387e+11 8.2654e+13 456835
## - longitude:median_income             1 6.9069e+11 8.2711e+13 456850
## - latitude:median_income              1 7.7281e+11 8.2793e+13 456870
## - housing_median_age:population       1 9.8521e+11 8.3005e+13 456923
## - latitude:ocean_proximity            3 1.2204e+12 8.3240e+13 456961
## - longitude:ocean_proximity           3 2.1626e+12 8.4182e+13 457194
## 
## Step:  AIC=456677
## median_house_value ~ longitude + latitude + housing_median_age + 
##     total_rooms + total_bedrooms + population + households + 
##     median_income + ocean_proximity + longitude:latitude + longitude:housing_median_age + 
##     longitude:total_rooms + longitude:total_bedrooms + longitude:population + 
##     longitude:households + longitude:median_income + longitude:ocean_proximity + 
##     latitude:housing_median_age + latitude:total_rooms + latitude:total_bedrooms + 
##     latitude:population + latitude:households + latitude:median_income + 
##     latitude:ocean_proximity + housing_median_age:total_rooms + 
##     housing_median_age:population + housing_median_age:households + 
##     housing_median_age:median_income + housing_median_age:ocean_proximity + 
##     total_rooms:population + total_rooms:households + total_rooms:median_income + 
##     total_rooms:ocean_proximity + total_bedrooms:population + 
##     total_bedrooms:households + population:households + population:median_income + 
##     population:ocean_proximity + households:median_income + households:ocean_proximity + 
##     median_income:ocean_proximity
## 
##                                      Df  Sum of Sq        RSS    AIC
## - households:ocean_proximity          3 8.6819e+10 8.2109e+13 456669
## - longitude:population                1 2.8898e+10 8.2051e+13 456674
## - total_bedrooms:population           1 3.2757e+10 8.2055e+13 456675
## - longitude:households                1 3.7782e+10 8.2060e+13 456677
## <none>                                             8.2022e+13 456677
## - latitude:households                 1 4.7102e+10 8.2069e+13 456679
## - latitude:population                 1 5.6585e+10 8.2078e+13 456681
## - population:median_income            1 6.5167e+10 8.2087e+13 456684
## - households:median_income            1 7.6714e+10 8.2099e+13 456686
## - total_rooms:ocean_proximity         3 1.5800e+11 8.2180e+13 456687
## - housing_median_age:total_rooms      1 9.0793e+10 8.2113e+13 456690
## - longitude:total_bedrooms            1 9.3476e+10 8.2115e+13 456691
## - housing_median_age:median_income    1 9.4407e+10 8.2116e+13 456691
## - housing_median_age:ocean_proximity  3 1.7542e+11 8.2197e+13 456691
## - longitude:total_rooms               1 1.3060e+11 8.2152e+13 456700
## - total_rooms:median_income           1 1.3267e+11 8.2155e+13 456700
## - latitude:total_bedrooms             1 1.5781e+11 8.2180e+13 456707
## - total_rooms:households              1 1.6984e+11 8.2192e+13 456710
## - latitude:total_rooms                1 1.7370e+11 8.2196e+13 456711
## - total_rooms:population              1 1.8733e+11 8.2209e+13 456714
## - median_income:ocean_proximity       3 3.0455e+11 8.2326e+13 456724
## - population:households               1 2.7241e+11 8.2294e+13 456736
## - population:ocean_proximity          3 3.5668e+11 8.2379e+13 456737
## - longitude:latitude                  1 3.9416e+11 8.2416e+13 456766
## - total_bedrooms:households           1 4.0745e+11 8.2429e+13 456769
## - longitude:housing_median_age        1 4.6683e+11 8.2489e+13 456784
## - latitude:housing_median_age         1 6.3894e+11 8.2661e+13 456827
## - longitude:median_income             1 6.8874e+11 8.2711e+13 456840
## - latitude:median_income              1 7.7086e+11 8.2793e+13 456860
## - housing_median_age:population       1 1.0009e+12 8.3023e+13 456917
## - latitude:ocean_proximity            3 1.2208e+12 8.3243e+13 456952
## - housing_median_age:households       1 1.2775e+12 8.3299e+13 456986
## - longitude:ocean_proximity           3 2.1610e+12 8.4183e+13 457184
## 
## Step:  AIC=456669.1
## median_house_value ~ longitude + latitude + housing_median_age + 
##     total_rooms + total_bedrooms + population + households + 
##     median_income + ocean_proximity + longitude:latitude + longitude:housing_median_age + 
##     longitude:total_rooms + longitude:total_bedrooms + longitude:population + 
##     longitude:households + longitude:median_income + longitude:ocean_proximity + 
##     latitude:housing_median_age + latitude:total_rooms + latitude:total_bedrooms + 
##     latitude:population + latitude:households + latitude:median_income + 
##     latitude:ocean_proximity + housing_median_age:total_rooms + 
##     housing_median_age:population + housing_median_age:households + 
##     housing_median_age:median_income + housing_median_age:ocean_proximity + 
##     total_rooms:population + total_rooms:households + total_rooms:median_income + 
##     total_rooms:ocean_proximity + total_bedrooms:population + 
##     total_bedrooms:households + population:households + population:median_income + 
##     population:ocean_proximity + households:median_income + median_income:ocean_proximity
## 
##                                      Df  Sum of Sq        RSS    AIC
## - longitude:population                1 1.7928e+10 8.2127e+13 456664
## - longitude:households                1 2.4690e+10 8.2133e+13 456665
## - total_bedrooms:population           1 3.2510e+10 8.2141e+13 456667
## <none>                                             8.2109e+13 456669
## - latitude:households                 1 4.4200e+10 8.2153e+13 456670
## - latitude:population                 1 4.8712e+10 8.2157e+13 456671
## - population:median_income            1 7.4432e+10 8.2183e+13 456678
## - longitude:total_bedrooms            1 8.3137e+10 8.2192e+13 456680
## - households:median_income            1 8.7278e+10 8.2196e+13 456681
## - housing_median_age:total_rooms      1 8.8601e+10 8.2197e+13 456681
## - housing_median_age:median_income    1 9.5562e+10 8.2204e+13 456683
## - housing_median_age:ocean_proximity  3 1.8170e+11 8.2290e+13 456685
## - longitude:total_rooms               1 1.2967e+11 8.2238e+13 456692
## - total_rooms:median_income           1 1.3273e+11 8.2241e+13 456692
## - latitude:total_bedrooms             1 1.5242e+11 8.2261e+13 456697
## - total_rooms:households              1 1.6981e+11 8.2279e+13 456702
## - total_rooms:population              1 1.9097e+11 8.2300e+13 456707
## - latitude:total_rooms                1 1.9999e+11 8.2309e+13 456709
## - total_rooms:ocean_proximity         3 2.8796e+11 8.2397e+13 456712
## - population:households               1 2.7898e+11 8.2388e+13 456729
## - population:ocean_proximity          3 4.0867e+11 8.2517e+13 456742
## - median_income:ocean_proximity       3 4.1706e+11 8.2526e+13 456744
## - longitude:latitude                  1 3.8852e+11 8.2497e+13 456757
## - total_bedrooms:households           1 4.0019e+11 8.2509e+13 456759
## - longitude:housing_median_age        1 4.7191e+11 8.2581e+13 456777
## - latitude:housing_median_age         1 6.4854e+11 8.2757e+13 456821
## - longitude:median_income             1 6.9310e+11 8.2802e+13 456833
## - latitude:median_income              1 7.9933e+11 8.2908e+13 456859
## - housing_median_age:population       1 1.0055e+12 8.3114e+13 456910
## - latitude:ocean_proximity            3 1.2363e+12 8.3345e+13 456948
## - housing_median_age:households       1 1.3227e+12 8.3431e+13 456989
## - longitude:ocean_proximity           3 2.1824e+12 8.4291e+13 457181
## 
## Step:  AIC=456663.7
## median_house_value ~ longitude + latitude + housing_median_age + 
##     total_rooms + total_bedrooms + population + households + 
##     median_income + ocean_proximity + longitude:latitude + longitude:housing_median_age + 
##     longitude:total_rooms + longitude:total_bedrooms + longitude:households + 
##     longitude:median_income + longitude:ocean_proximity + latitude:housing_median_age + 
##     latitude:total_rooms + latitude:total_bedrooms + latitude:population + 
##     latitude:households + latitude:median_income + latitude:ocean_proximity + 
##     housing_median_age:total_rooms + housing_median_age:population + 
##     housing_median_age:households + housing_median_age:median_income + 
##     housing_median_age:ocean_proximity + total_rooms:population + 
##     total_rooms:households + total_rooms:median_income + total_rooms:ocean_proximity + 
##     total_bedrooms:population + total_bedrooms:households + population:households + 
##     population:median_income + population:ocean_proximity + households:median_income + 
##     median_income:ocean_proximity
## 
##                                      Df  Sum of Sq        RSS    AIC
## - longitude:households                1 1.0557e+10 8.2137e+13 456656
## - total_bedrooms:population           1 2.5196e+10 8.2152e+13 456660
## - latitude:households                 1 2.6992e+10 8.2154e+13 456660
## <none>                                             8.2127e+13 456664
## - population:median_income            1 8.2988e+10 8.2210e+13 456675
## - latitude:population                 1 8.6665e+10 8.2213e+13 456675
## - housing_median_age:total_rooms      1 9.4969e+10 8.2222e+13 456678
## - households:median_income            1 9.7593e+10 8.2224e+13 456678
## - housing_median_age:median_income    1 1.0014e+11 8.2227e+13 456679
## - housing_median_age:ocean_proximity  3 1.8124e+11 8.2308e+13 456679
## - longitude:total_bedrooms            1 1.0440e+11 8.2231e+13 456680
## - total_rooms:median_income           1 1.3046e+11 8.2257e+13 456686
## - total_rooms:households              1 1.5955e+11 8.2286e+13 456694
## - total_rooms:population              1 1.7940e+11 8.2306e+13 456699
## - latitude:total_bedrooms             1 1.8561e+11 8.2312e+13 456700
## - longitude:total_rooms               1 2.0231e+11 8.2329e+13 456704
## - latitude:total_rooms                1 2.9682e+11 8.2423e+13 456728
## - population:households               1 2.9729e+11 8.2424e+13 456728
## - median_income:ocean_proximity       3 4.2540e+11 8.2552e+13 456740
## - total_bedrooms:households           1 3.8552e+11 8.2512e+13 456750
## - longitude:latitude                  1 3.8593e+11 8.2513e+13 456750
## - total_rooms:ocean_proximity         3 4.8757e+11 8.2614e+13 456756
## - longitude:housing_median_age        1 4.6704e+11 8.2594e+13 456771
## - population:ocean_proximity          3 7.1419e+11 8.2841e+13 456813
## - latitude:housing_median_age         1 6.4266e+11 8.2769e+13 456815
## - longitude:median_income             1 7.1390e+11 8.2841e+13 456832
## - latitude:median_income              1 8.1651e+11 8.2943e+13 456858
## - housing_median_age:population       1 1.0090e+12 8.3136e+13 456906
## - latitude:ocean_proximity            3 1.2450e+12 8.3372e+13 456944
## - housing_median_age:households       1 1.3474e+12 8.3474e+13 456990
## - longitude:ocean_proximity           3 2.1964e+12 8.4323e+13 457178
## 
## Step:  AIC=456656.4
## median_house_value ~ longitude + latitude + housing_median_age + 
##     total_rooms + total_bedrooms + population + households + 
##     median_income + ocean_proximity + longitude:latitude + longitude:housing_median_age + 
##     longitude:total_rooms + longitude:total_bedrooms + longitude:median_income + 
##     longitude:ocean_proximity + latitude:housing_median_age + 
##     latitude:total_rooms + latitude:total_bedrooms + latitude:population + 
##     latitude:households + latitude:median_income + latitude:ocean_proximity + 
##     housing_median_age:total_rooms + housing_median_age:population + 
##     housing_median_age:households + housing_median_age:median_income + 
##     housing_median_age:ocean_proximity + total_rooms:population + 
##     total_rooms:households + total_rooms:median_income + total_rooms:ocean_proximity + 
##     total_bedrooms:population + total_bedrooms:households + population:households + 
##     population:median_income + population:ocean_proximity + households:median_income + 
##     median_income:ocean_proximity
## 
##                                      Df  Sum of Sq        RSS    AIC
## - total_bedrooms:population           1 1.9534e+10 8.2157e+13 456651
## - latitude:households                 1 2.1573e+10 8.2159e+13 456652
## <none>                                             8.2137e+13 456656
## - population:median_income            1 8.4172e+10 8.2221e+13 456668
## - latitude:population                 1 9.8545e+10 8.2236e+13 456671
## - households:median_income            1 1.0403e+11 8.2241e+13 456673
## - housing_median_age:ocean_proximity  3 1.8748e+11 8.2325e+13 456674
## - housing_median_age:median_income    1 1.0984e+11 8.2247e+13 456674
## - housing_median_age:total_rooms      1 1.1277e+11 8.2250e+13 456675
## - total_rooms:median_income           1 1.2353e+11 8.2261e+13 456677
## - total_rooms:households              1 1.6280e+11 8.2300e+13 456687
## - longitude:total_bedrooms            1 1.7721e+11 8.2314e+13 456691
## - total_rooms:population              1 1.8512e+11 8.2322e+13 456693
## - longitude:total_rooms               1 1.9234e+11 8.2330e+13 456695
## - latitude:total_bedrooms             1 2.5537e+11 8.2393e+13 456710
## - latitude:total_rooms                1 2.8851e+11 8.2426e+13 456719
## - median_income:ocean_proximity       3 4.2590e+11 8.2563e+13 456733
## - total_bedrooms:households           1 3.8589e+11 8.2523e+13 456743
## - longitude:latitude                  1 3.9955e+11 8.2537e+13 456747
## - total_rooms:ocean_proximity         3 4.7908e+11 8.2616e+13 456747
## - population:households               1 4.0817e+11 8.2545e+13 456749
## - longitude:housing_median_age        1 4.6702e+11 8.2604e+13 456763
## - latitude:housing_median_age         1 6.4257e+11 8.2780e+13 456807
## - population:ocean_proximity          3 7.3062e+11 8.2868e+13 456809
## - longitude:median_income             1 7.0672e+11 8.2844e+13 456823
## - latitude:median_income              1 8.0895e+11 8.2946e+13 456849
## - housing_median_age:population       1 1.0220e+12 8.3159e+13 456902
## - latitude:ocean_proximity            3 1.2725e+12 8.3410e+13 456944
## - housing_median_age:households       1 1.4900e+12 8.3627e+13 457017
## - longitude:ocean_proximity           3 2.2268e+12 8.4364e+13 457179
## 
## Step:  AIC=456651.3
## median_house_value ~ longitude + latitude + housing_median_age + 
##     total_rooms + total_bedrooms + population + households + 
##     median_income + ocean_proximity + longitude:latitude + longitude:housing_median_age + 
##     longitude:total_rooms + longitude:total_bedrooms + longitude:median_income + 
##     longitude:ocean_proximity + latitude:housing_median_age + 
##     latitude:total_rooms + latitude:total_bedrooms + latitude:population + 
##     latitude:households + latitude:median_income + latitude:ocean_proximity + 
##     housing_median_age:total_rooms + housing_median_age:population + 
##     housing_median_age:households + housing_median_age:median_income + 
##     housing_median_age:ocean_proximity + total_rooms:population + 
##     total_rooms:households + total_rooms:median_income + total_rooms:ocean_proximity + 
##     total_bedrooms:households + population:households + population:median_income + 
##     population:ocean_proximity + households:median_income + median_income:ocean_proximity
## 
##                                      Df  Sum of Sq        RSS    AIC
## - latitude:households                 1 2.1726e+10 8.2178e+13 456647
## <none>                                             8.2157e+13 456651
## - latitude:population                 1 1.1018e+11 8.2267e+13 456669
## - housing_median_age:median_income    1 1.1078e+11 8.2267e+13 456669
## - housing_median_age:ocean_proximity  3 1.9057e+11 8.2347e+13 456669
## - housing_median_age:total_rooms      1 1.1619e+11 8.2273e+13 456671
## - households:median_income            1 1.2251e+11 8.2279e+13 456672
## - total_rooms:median_income           1 1.2384e+11 8.2281e+13 456672
## - population:median_income            1 1.3242e+11 8.2289e+13 456675
## - longitude:total_bedrooms            1 1.7604e+11 8.2333e+13 456686
## - longitude:total_rooms               1 1.8986e+11 8.2347e+13 456689
## - total_rooms:households              1 1.9171e+11 8.2348e+13 456689
## - latitude:total_bedrooms             1 2.5534e+11 8.2412e+13 456705
## - total_rooms:population              1 2.6676e+11 8.2423e+13 456708
## - latitude:total_rooms                1 2.8390e+11 8.2441e+13 456713
## - median_income:ocean_proximity       3 4.2500e+11 8.2582e+13 456728
## - longitude:latitude                  1 3.9509e+11 8.2552e+13 456740
## - total_rooms:ocean_proximity         3 4.7901e+11 8.2636e+13 456741
## - longitude:housing_median_age        1 4.6885e+11 8.2626e+13 456759
## - latitude:housing_median_age         1 6.4334e+11 8.2800e+13 456802
## - population:ocean_proximity          3 7.2775e+11 8.2884e+13 456804
## - population:households               1 6.7935e+11 8.2836e+13 456811
## - total_bedrooms:households           1 6.8621e+11 8.2843e+13 456813
## - longitude:median_income             1 7.0865e+11 8.2865e+13 456819
## - latitude:median_income              1 8.1094e+11 8.2968e+13 456844
## - housing_median_age:population       1 1.0106e+12 8.3167e+13 456894
## - latitude:ocean_proximity            3 1.2720e+12 8.3429e+13 456939
## - housing_median_age:households       1 1.4778e+12 8.3635e+13 457009
## - longitude:ocean_proximity           3 2.2223e+12 8.4379e+13 457172
## 
## Step:  AIC=456646.9
## median_house_value ~ longitude + latitude + housing_median_age + 
##     total_rooms + total_bedrooms + population + households + 
##     median_income + ocean_proximity + longitude:latitude + longitude:housing_median_age + 
##     longitude:total_rooms + longitude:total_bedrooms + longitude:median_income + 
##     longitude:ocean_proximity + latitude:housing_median_age + 
##     latitude:total_rooms + latitude:total_bedrooms + latitude:population + 
##     latitude:median_income + latitude:ocean_proximity + housing_median_age:total_rooms + 
##     housing_median_age:population + housing_median_age:households + 
##     housing_median_age:median_income + housing_median_age:ocean_proximity + 
##     total_rooms:population + total_rooms:households + total_rooms:median_income + 
##     total_rooms:ocean_proximity + total_bedrooms:households + 
##     population:households + population:median_income + population:ocean_proximity + 
##     households:median_income + median_income:ocean_proximity
## 
##                                      Df  Sum of Sq        RSS    AIC
## <none>                                             8.2178e+13 456647
## - latitude:population                 1 8.9938e+10 8.2268e+13 456659
## - housing_median_age:total_rooms      1 1.0965e+11 8.2288e+13 456664
## - housing_median_age:median_income    1 1.0979e+11 8.2288e+13 456664
## - housing_median_age:ocean_proximity  3 1.9301e+11 8.2371e+13 456665
## - total_rooms:median_income           1 1.1791e+11 8.2296e+13 456667
## - households:median_income            1 1.1897e+11 8.2297e+13 456667
## - population:median_income            1 1.2010e+11 8.2299e+13 456667
## - total_rooms:households              1 1.8960e+11 8.2368e+13 456684
## - longitude:total_bedrooms            1 1.9646e+11 8.2375e+13 456686
## - longitude:total_rooms               1 2.0465e+11 8.2383e+13 456688
## - total_rooms:population              1 2.7003e+11 8.2448e+13 456705
## - latitude:total_rooms                1 3.0622e+11 8.2485e+13 456714
## - median_income:ocean_proximity       3 4.3223e+11 8.2611e+13 456725
## - latitude:total_bedrooms             1 4.0071e+11 8.2579e+13 456737
## - longitude:latitude                  1 4.1726e+11 8.2596e+13 456741
## - total_rooms:ocean_proximity         3 5.0035e+11 8.2679e+13 456742
## - longitude:housing_median_age        1 4.7676e+11 8.2655e+13 456756
## - latitude:housing_median_age         1 6.5598e+11 8.2834e+13 456801
## - total_bedrooms:households           1 6.7859e+11 8.2857e+13 456807
## - population:households               1 6.8453e+11 8.2863e+13 456808
## - population:ocean_proximity          3 7.7513e+11 8.2954e+13 456811
## - longitude:median_income             1 7.1905e+11 8.2897e+13 456817
## - latitude:median_income              1 8.2539e+11 8.3004e+13 456843
## - housing_median_age:population       1 9.9965e+11 8.3178e+13 456886
## - latitude:ocean_proximity            3 1.2632e+12 8.3442e+13 456932
## - housing_median_age:households       1 1.4589e+12 8.3637e+13 457000
## - longitude:ocean_proximity           3 2.2125e+12 8.4391e+13 457165
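A note on reading the trace above: step() always labels its criterion column AIC, whatever penalty multiplier k it was given. With k = log(n) the column actually holds BIC, and the values printed here appear consistent with that choice, which matches the model name back_bic_mod. The sketch below checks the relationship on the built-in mtcars data rather than the housing data; fit_full, fit_bic, crit and manual are illustrative names that are not part of this project.

```r
# Illustration on the built-in mtcars data (an assumption; not the housing data).
fit_full = lm(mpg ~ wt + hp + qsec + drat, data = mtcars)
n = nrow(mtcars)

# Backward selection with the BIC penalty; trace = 0 suppresses the step tables.
fit_bic = step(fit_full, direction = "backward", k = log(n), trace = 0)

# extractAIC() returns c(edf, criterion). With k = log(n) the criterion is
# n * log(RSS / n) + log(n) * edf, the quantity step() prints under "AIC".
crit = extractAIC(fit_bic, k = log(n))
manual = n * log(sum(resid(fit_bic)^2) / n) + log(n) * crit[1]
all.equal(crit[2], manual)
```

Because the penalty per parameter is log(n) instead of 2, this criterion removes terms more readily than genuine AIC once n exceeds about 7.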
summary(back_bic_mod)
## 
## Call:
## lm(formula = median_house_value ~ longitude + latitude + housing_median_age + 
##     total_rooms + total_bedrooms + population + households + 
##     median_income + ocean_proximity + longitude:latitude + longitude:housing_median_age + 
##     longitude:total_rooms + longitude:total_bedrooms + longitude:median_income + 
##     longitude:ocean_proximity + latitude:housing_median_age + 
##     latitude:total_rooms + latitude:total_bedrooms + latitude:population + 
##     latitude:median_income + latitude:ocean_proximity + housing_median_age:total_rooms + 
##     housing_median_age:population + housing_median_age:households + 
##     housing_median_age:median_income + housing_median_age:ocean_proximity + 
##     total_rooms:population + total_rooms:households + total_rooms:median_income + 
##     total_rooms:ocean_proximity + total_bedrooms:households + 
##     population:households + population:median_income + population:ocean_proximity + 
##     households:median_income + median_income:ocean_proximity, 
##     data = temp_housing_data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -471806  -36852   -8923   25350  616260 
## 
## Coefficients:
##                                                Estimate Std. Error t value
## (Intercept)                                  -7.296e+06  9.152e+05  -7.972
## longitude                                    -5.241e+04  8.032e+03  -6.525
## latitude                                      2.929e+05  2.736e+04  10.705
## housing_median_age                           -7.643e+04  7.648e+03  -9.993
## total_rooms                                   6.752e+02  1.077e+02   6.267
## total_bedrooms                               -2.750e+03  5.218e+02  -5.271
## population                                   -1.033e+02  1.829e+01  -5.651
## households                                   -1.830e+02  1.618e+01 -11.309
## median_income                                -7.382e+05  6.002e+04 -12.300
## ocean_proximityINLAND                         7.288e+05  2.479e+05   2.939
## ocean_proximityNEAR BAY                      -2.423e+07  1.239e+06 -19.561
## ocean_proximityNEAR OCEAN                    -7.422e+05  3.102e+05  -2.392
## longitude:latitude                            2.179e+03  2.131e+02  10.223
## longitude:housing_median_age                 -9.658e+02  8.838e+01 -10.928
## longitude:total_rooms                         8.930e+00  1.247e+00   7.160
## longitude:total_bedrooms                     -4.255e+01  6.066e+00  -7.015
## longitude:median_income                      -9.513e+03  7.088e+02 -13.420
## longitude:ocean_proximityINLAND               1.116e+04  2.949e+03   3.785
## longitude:ocean_proximityNEAR BAY            -2.384e+05  1.092e+04 -21.835
## longitude:ocean_proximityNEAR OCEAN          -5.991e+03  3.701e+03  -1.619
## latitude:housing_median_age                  -1.114e+03  8.688e+01 -12.818
## latitude:total_rooms                          1.101e+01  1.257e+00   8.758
## latitude:total_bedrooms                      -6.141e+01  6.130e+00 -10.019
## latitude:population                           2.513e+00  5.294e-01   4.746
## latitude:median_income                       -1.053e+04  7.320e+02 -14.379
## latitude:ocean_proximityINLAND                1.381e+04  3.079e+03   4.485
## latitude:ocean_proximityNEAR BAY             -1.298e+05  8.441e+03 -15.372
## latitude:ocean_proximityNEAR OCEAN            1.057e+02  3.864e+03   0.027
## housing_median_age:total_rooms               -3.385e-01  6.459e-02  -5.241
## housing_median_age:population                -1.488e+00  9.405e-02 -15.824
## housing_median_age:households                 7.216e+00  3.775e-01  19.116
## housing_median_age:median_income              1.218e+02  2.322e+01   5.244
## housing_median_age:ocean_proximityINLAND      9.414e+02  1.453e+02   6.479
## housing_median_age:ocean_proximityNEAR BAY   -9.824e+01  1.608e+02  -0.611
## housing_median_age:ocean_proximityNEAR OCEAN  2.189e+02  1.324e+02   1.653
## total_rooms:population                       -3.332e-03  4.051e-04  -8.224
## total_rooms:households                        8.135e-03  1.180e-03   6.891
## total_rooms:median_income                     1.553e+00  2.857e-01   5.435
## total_rooms:ocean_proximityINLAND            -1.346e+01  1.331e+00 -10.113
## total_rooms:ocean_proximityNEAR BAY           6.043e+00  2.412e+00   2.505
## total_rooms:ocean_proximityNEAR OCEAN        -2.618e+00  1.491e+00  -1.756
## total_bedrooms:households                    -6.688e-02  5.130e-03 -13.037
## population:households                         2.742e-02  2.094e-03  13.094
## population:median_income                     -3.624e+00  6.607e-01  -5.485
## population:ocean_proximityINLAND              2.963e+01  2.360e+00  12.553
## population:ocean_proximityNEAR BAY           -1.362e+01  4.772e+00  -2.854
## population:ocean_proximityNEAR OCEAN          8.383e+00  2.852e+00   2.939
## households:median_income                      1.382e+01  2.532e+00   5.459
## median_income:ocean_proximityINLAND           1.019e+04  1.058e+03   9.633
## median_income:ocean_proximityNEAR BAY         2.301e+03  1.031e+03   2.232
## median_income:ocean_proximityNEAR OCEAN       3.432e+03  8.250e+02   4.160
##                                              Pr(>|t|)    
## (Intercept)                                  1.64e-15 ***
## longitude                                    6.95e-11 ***
## latitude                                      < 2e-16 ***
## housing_median_age                            < 2e-16 ***
## total_rooms                                  3.75e-10 ***
## total_bedrooms                               1.37e-07 ***
## population                                   1.62e-08 ***
## households                                    < 2e-16 ***
## median_income                                 < 2e-16 ***
## ocean_proximityINLAND                        0.003292 ** 
## ocean_proximityNEAR BAY                       < 2e-16 ***
## ocean_proximityNEAR OCEAN                    0.016756 *  
## longitude:latitude                            < 2e-16 ***
## longitude:housing_median_age                  < 2e-16 ***
## longitude:total_rooms                        8.36e-13 ***
## longitude:total_bedrooms                     2.37e-12 ***
## longitude:median_income                       < 2e-16 ***
## longitude:ocean_proximityINLAND              0.000154 ***
## longitude:ocean_proximityNEAR BAY             < 2e-16 ***
## longitude:ocean_proximityNEAR OCEAN          0.105494    
## latitude:housing_median_age                   < 2e-16 ***
## latitude:total_rooms                          < 2e-16 ***
## latitude:total_bedrooms                       < 2e-16 ***
## latitude:population                          2.09e-06 ***
## latitude:median_income                        < 2e-16 ***
## latitude:ocean_proximityINLAND               7.31e-06 ***
## latitude:ocean_proximityNEAR BAY              < 2e-16 ***
## latitude:ocean_proximityNEAR OCEAN           0.978167    
## housing_median_age:total_rooms               1.61e-07 ***
## housing_median_age:population                 < 2e-16 ***
## housing_median_age:households                 < 2e-16 ***
## housing_median_age:median_income             1.59e-07 ***
## housing_median_age:ocean_proximityINLAND     9.43e-11 ***
## housing_median_age:ocean_proximityNEAR BAY   0.541162    
## housing_median_age:ocean_proximityNEAR OCEAN 0.098260 .  
## total_rooms:population                        < 2e-16 ***
## total_rooms:households                       5.69e-12 ***
## total_rooms:median_income                    5.55e-08 ***
## total_rooms:ocean_proximityINLAND             < 2e-16 ***
## total_rooms:ocean_proximityNEAR BAY          0.012240 *  
## total_rooms:ocean_proximityNEAR OCEAN        0.079153 .  
## total_bedrooms:households                     < 2e-16 ***
## population:households                         < 2e-16 ***
## population:median_income                     4.19e-08 ***
## population:ocean_proximityINLAND              < 2e-16 ***
## population:ocean_proximityNEAR BAY           0.004322 ** 
## population:ocean_proximityNEAR OCEAN         0.003292 ** 
## households:median_income                     4.85e-08 ***
## median_income:ocean_proximityINLAND           < 2e-16 ***
## median_income:ocean_proximityNEAR BAY        0.025656 *  
## median_income:ocean_proximityNEAR OCEAN      3.19e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 63190 on 20584 degrees of freedom
## Multiple R-squared:  0.7008, Adjusted R-squared:  0.7001 
## F-statistic: 964.2 on 50 and 20584 DF,  p-value: < 2.2e-16
# Let's use weighted least squares (WLS) on this model to address potential heteroskedasticity.

back_bic_mod_fitted = fitted(back_bic_mod)
back_bic_mod_resid = resid(back_bic_mod)

# I() is required: inside a formula, a bare x^2 is read as a crossing and reduces to x.
temp_wls_mod = lm(log(back_bic_mod_resid^2) ~ back_bic_mod_fitted + I(back_bic_mod_fitted^2))
ghat = fitted(temp_wls_mod)
hhat = exp(ghat)

# TODO: figure out how to get the call.
WLS_back_bic_mod = update(back_bic_mod, weights = 1 / hhat)

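# Residuals vs. fitted values for the unweighted model.
plot(
      back_bic_mod$fitted.values,
      back_bic_mod$residuals,
      xlab = "Fitted values",
      ylab = "Residuals"
)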

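# Plot weighted residuals: the raw residuals of a WLS fit keep their non-constant spread.
plot(
      WLS_back_bic_mod$fitted.values,
      weighted.residuals(WLS_back_bic_mod),
      xlab = "Fitted values",
      ylab = "Weighted residuals"
)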
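The weighting procedure above can be tried end to end on simulated data where the heteroskedasticity is known. This is only a minimal sketch under stated assumptions: the data-generating process and the names x, y, ols_mod, var_mod and wls_mod are made up for illustration and are not part of this project.

```r
set.seed(42)
n = 500
x = runif(n, 1, 10)
# Assumed data: the error standard deviation grows with x (heteroskedastic errors).
y = 2 + 3 * x + rnorm(n, sd = 0.5 * x)

ols_mod = lm(y ~ x)
ols_fitted = fitted(ols_mod)

# Model the log squared residuals as a quadratic in the fitted values.
# I() keeps ^2 as arithmetic inside the formula.
var_mod = lm(log(resid(ols_mod)^2) ~ ols_fitted + I(ols_fitted^2))
hhat = exp(fitted(var_mod))  # estimated variance function, always positive

# Refit with weights inversely proportional to the estimated variance.
wls_mod = update(ols_mod, weights = 1 / hhat)

# Weighted residuals are the ones expected to show roughly constant spread.
plot(fitted(wls_mod), weighted.residuals(wls_mod),
     xlab = "Fitted values", ylab = "Weighted residuals")
abline(h = 0, lty = 2)
```

Modeling the log of the squared residuals and exponentiating the fit guarantees positive variance estimates, so every weight 1 / hhat is valid.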